MacFormat 1996 March

Text File | 1995-08-10 | 11.4 KB | 274 lines | [TEXT/QED1]

[
The "Support Indices..." option from the "N-out" field menu of DNA Translator's utility cards is a _power user_ feature for automating the calculation of "support index" (SI) values (for an example, see Eernisse, D. J., and A. G. Kluge. 1993. Taxonomic congruence versus total evidence, and the phylogeny of amniotes inferred from fossils, molecules and morphology. Molecular Biology and Evolution 10: 1170-1195; also referred to by Donoghue and others as "decay index" or by Farris and collaborators as "Bremer Support" values).

Quick Instructions:

1. Save a single most-parsimonious tree or a consensus tree in PAUP format. Choose one that you are likely to figure in your publication. The goal is to calculate SI values that you can add to your figure. In either case, record the length of any _minimum length_ trees that PAUP finds.
2. Have your normal PAUP file from this analysis available as well, and make a back-up copy of your original file.
3. Select "Support Indices..." from the "N-out" field menu of DNA Translator's utility cards. If you don't see the small "N-out" menu, you may have to first select "Show Field Menus" from the View menu.
4. You will be prompted for your tree file (see step 1).
5. You will be given the option of appending the SI search blocks to your current PAUP file (see step 2).
6. You probably don't need to save tree files for each SI search, even though you will be given the option to do so.
7. Execute your modified PAUP file. This will automatically perform all searches and save the results to a log file. You are doing a "converse constraint" search for every node of every tree in your tree file (note that PAUP can display a "node" separated from a single outgroup if you set this option, but this is not a node that you will want to calculate an SI for).
8. Open the log file that PAUP created. Subtract the minimum length of your original unconstrained analysis (L1) from the minimum length of each converse constraint analysis (L2).
This gives you an SI value. For example, if L2=320 and L1=317, then SI=3, which means that the shortest tree that does not contain the clade for the node in question is three steps longer than the minimum length tree that does keep those taxa together.

SI values vs. Bootstrap values:

A high SI generally corresponds to a high bootstrap value (with some infrequent exceptions) but:

1. SI calculations are based on exactly the same data set as your parsimony search (i.e., they are based on all available evidence).
2. SI calculations have a more direct and intuitive relationship to the "robustness" of nodes.
3. SI calculations do not confound statistical support for the reality of a particular tree with the separate issue of how likely it is that the same tree would be obtained if you had more characters from the same universe of characters, whether or not it is the "true" tree.
4. Some of the assumptions of a bootstrap analysis are frequently violated. For example, it depends on the i.i.d. assumptions identified by Felsenstein and others, which require that the characters be independently and identically distributed. Sanderson has separated these into two less restrictive assumptions, namely, that characters are independent, and that the observed character set is a "representative" sample of the "universe of characters" (paraphrased from PAUP 3.1 manual, p. 56).
5. Whether or not SI values differ from bootstrap analyses in overcoming systematic biases such as "long branch effects" has not been exhaustively explored, but it might be at least advisable to try both, rather than limit your estimates of node robustness to bootstrap values alone (the most common practice).

Overview:

There are generally two approaches that have been used to calculate SI values. First, one can do a search for one or more minimum length trees, followed by a series of analyses keeping all trees of ever-increasing length.
Here is my own example of PAUP blocks to automate such searches:

Step 1
^^^^^^
]

[Normal Nexus data matrix goes here]

begin paup;
    charset mol = 275-747;    [I happened to exclude some characters here]
    taxset dupmol = 1-17;     [I happened to delete some taxa here]
    exclude mol;
    delete fish1;
    delete dupmol;
    log file='morphMS.log 3/18/93' append;    [log results to a file!]
    [!Total evidence analysis, Eernisse & Kluge ]
    [!informative sites only, gapmode = missing]
    [!keeping all trees of min. length]
    HSEARCH ADDSEQ=RANDOM NREPS=10 KEEP=0;
    savetrees file='morph+0_MS.trees';
    describe;    [note the length of first min. length tree]
    contree /strict save file = 'morph+0_MS.cons' replace;
    log stop;
endblock;

[At this point, it is necessary to stop the analysis and execute the following paup block with the appropriate "KEEP=" values entered below. In my case, the min. length trees I found in the above analysis had Length=461. You can put all this in one file and comment out the below or above paup blocks for the first or second steps, respectively. ]

[
Step 2
^^^^^^
]

begin paup;
    log file='morphMS.log 3/18/93' append;

    [!Total evidence analysis, Eernisse & Kluge ]
    [!keeping all trees of min. length plus 1 step]
    HSEARCH ADDSEQ=RANDOM NREPS=10 KEEP=462;    [i.e., 461+1]
    savetrees file='morph+1_MS.trees' replace;
    describe;
    contree /strict save file = 'morph+1_MS.cons' replace;

    [!Total evidence analysis, Eernisse & Kluge ]
    [!keeping all trees of min. length plus 2 steps]
    HSEARCH ADDSEQ=RANDOM NREPS=10 KEEP=463;
    savetrees file='morph+2_MS.trees' replace;
    describe;
    contree /strict save file = 'morph+2_MS.cons' replace;

    [!Total evidence analysis, Eernisse & Kluge ]
    [!keeping all trees of min. length plus 3 steps]
    HSEARCH ADDSEQ=RANDOM NREPS=10 KEEP=464;
    savetrees file='morph+3_MS.trees' replace;
    describe;
    contree /strict save file = 'morph+3_MS.cons' replace;

    log stop;
endblock;

[
The above searches will produce (among other things) four strict consensus trees of all trees of min.
length plus 0, 1, 2, and 3 steps, respectively. You can examine these consensus trees of decreasing resolution and thereby assess the number of steps required before a node is no longer supported. For example, if a node is in the first three consensus trees but not the last, then the node has an S.I. value of 3 (any hypothesis found which does not include that node is at least 3 steps longer than the minimum length hypothesis).

The problem with this approach becomes obvious as one attempts to increase the "KEEP" value. Not surprisingly, one finds an increasingly large number of trees, so that eventually it becomes very time-consuming to swap on all found trees. It was not feasible to calculate S.I. values > 4 in the above analysis.

Step 3
^^^^^^

Another approach is to use PAUP's converse constraint searches to search node by node for the shortest trees that fail to satisfy a constraint corresponding to each respective node. That is where (finally) the "Extract Nodes" utility comes in. Do the following:

1. Perform the first search block above using PAUP.
2. On a DNA Translator "utility" card, make sure that "Field Menus" are shown (View menu; this feature is not yet available from the Convert menu -- sorry!).
3. Pull down the "N-str" menu, and select "Import Sequences" and "Disk" (Yes, I know this isn't intuitive -- I am putting off a revamping of the menus until the next major revision).
4. Choose (from the open dialog) the consensus file from your min. length + 0 steps search (e.g., "morph+0_MS.cons"). The tree file should be imported to the "N-str" field.
5. Select "Export as" and "Extract Nodes" from the "N-str" menu. Specify the name and location of the output from the save dialog. You will also be asked to choose the kind of search (branch and bound or heuristic) and, if heuristic, simple or random addition sequence (the former is not recommended).
If random addition, then the number of replicates must be specified (the default value of 10 may be too low because PAUP is less efficient when converse constraints are imposed -- use "Test Heuristics" to determine the effectiveness of each replicate).
6. Examine the output in PAUP. You may want to comment out some of the analysis blocks if you already did the "Step 2" searches above.

Here are parts of an example output file:
]

begin assumptions;
    [Nodes for tree strict (tree 1)]
    taxset t1_n2= DISCO PELOB PLETH AMPHI CAEC1 CAEC2 ICHTH SIREN AMBYS TYPHL SOOGL LEPTO MICRO BUFON HYLID PIPID TURTL CAPTORHI SNAKE LIZAR CROCO PASSE GALLI PROTEROC ERYTHROS PROTEROS PROTOROS CHORISTO TRYLOPHO RHYNCHOS ARAEOSCE MOUSE RATTU HUMAN RABBI MORGANUC TRITYLOD EXAERETO DIADEMOD THRINAXO PROCYNOS THEROCEP DICYNODO GORGONOP DINOCEPH BIARMOSU SPHENACO EDAPHOSA OPHIACOD CASEA;
    taxset t1_n3= DISCO PELOB PLETH AMPHI CAEC1 CAEC2 ICHTH SIREN AMBYS TYPHL SOOGL LEPTO MICRO BUFON HYLID PIPID;
    taxset t1_n4= DISCO PELOB PLETH AMPHI CAEC1 CAEC2 ICHTH SIREN AMBYS TYPHL SOOGL LEPTO MICRO BUFON HYLID;
    taxset t1_n5=DISCO PELOB PLETH AMPHI CAEC1 CAEC2 ICHTH SIREN AMBYS TYPHL;
    taxset t1_n6= PELOB PLETH AMPHI CAEC1 CAEC2 ICHTH;
    taxset t1_n7=PELOB PLETH;
    taxset t1_n8= AMPHI CAEC1 CAEC2 ICHTH;
    [etc.]
    taxset t1_n46= MOUSE RATTU HUMAN RABBI;
    taxset t1_n47= MOUSE RATTU HUMAN;
    taxset t1_n48=MOUSE RATTU;
    [multiple trees are allowed as input, so tree 2 would start at t2_n2, and so on.]
endblock;

begin paup;
    [Constraints for nodes in tree strict (tree 1)]
    constraints ct1_n2=((t1_n2));
    constraints ct1_n3=((t1_n3));
    constraints ct1_n4=((t1_n4));
    constraints ct1_n5=((t1_n5));
    constraints ct1_n6=((t1_n6));
    constraints ct1_n7=((t1_n7));
    constraints ct1_n8=((t1_n8));
    [etc.]
    constraints ct1_n46=((t1_n46));
    constraints ct1_n47=((t1_n47));
    constraints ct1_n48=((t1_n48));
endblock;

begin paup;
    log file = 'si_calcs.log 5/20/93' append;
    [Constraint search blocks created by DNA Translator stack on 5/20/93, 11:23 AM]

    [!Following search uses ct1_n2 constraint]
    showconstr ct1_n2;
    HSEARCH ADDSEQ=RANDOM NREPS=10 enforce converse constraints = ct1_n2;
    savetrees file='ct1_n2.trees 5/20/93' replace;
    describe;

    [!Following search uses ct1_n3 constraint]
    showconstr ct1_n3;
    HSEARCH ADDSEQ=RANDOM NREPS=10 enforce converse constraints = ct1_n3;
    savetrees file='ct1_n3.trees 5/20/93' replace;
    describe;

    [etc.]
    log stop;
endblock;

[
The resulting log file can be examined for the SI value corresponding to each node. Saving tree files is optional. If the "no save" option is specified, the output will lack the "savetrees..." command. If you want to do additional replicate searches, be sure to add "RSEED=xxxxxxx" after "HSEARCH" and before ";", where xxxxxxx is any integer not previously used (e.g., "HSEARCH ADDSEQ=RANDOM NREPS=10 RSEED=93475028 enforce converse constraints = ct1_n2;"). Otherwise, the first search after each launch of PAUP will start from PAUP's default of RSEED=1, and you risk repeating exactly the same searches.
]

[
As always, I would like to hear from you. Please acknowledge my stack (Eernisse, D. J. 1992. DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules. CABIOS 8:177-184) and/or the above Eernisse and Kluge reference if you publish SI values with its help.

DJE 6/1/93 (updated 3/24/95)
deernisse@fullerton.edu
]
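
[
The SI arithmetic from step 8 of the Quick Instructions (SI = L2 - L1) can be sketched in Python. This is only a minimal illustration and is not part of DNA Translator or PAUP: the function name support_indices is hypothetical, the lengths are assumed to have been copied by hand from your own log file, and only the L1=317 / L2=320 pair comes from the worked example above (the second L2 value is invented).

```python
def support_indices(l1, l2_by_constraint):
    """Return SI = L2 - L1 for every converse-constraint search.

    l1: minimum tree length from the unconstrained search.
    l2_by_constraint: maps a constraint label to the minimum tree length
        found when that constraint was enforced as a converse constraint.
    """
    result = {}
    for label, l2 in l2_by_constraint.items():
        if l2 < l1:
            # Assumption of this sketch: a shorter constrained tree means
            # the unconstrained search did not reach the true minimum.
            raise ValueError(
                f"{label}: L2={l2} is shorter than L1={l1}; "
                "the unconstrained search probably missed the shortest trees"
            )
        result[label] = l2 - l1
    return result


if __name__ == "__main__":
    # L1=317 and L2=320 follow the worked example; 318 is invented.
    lengths = {"ct1_n2": 320, "ct1_n3": 318}
    for label, si in sorted(support_indices(317, lengths).items()):
        print(f"{label}: SI={si}")
```

An SI of 0 means that a tree of minimum length already lacks the node. The sketch raises an error when L2 is shorter than L1, on the assumption that the unconstrained search should then be repeated before any SI values are reported.
]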